# DATA PROCESSING - Neuroinflammation and Neurodegenerative Disorders: Integrated Molecular Biomarkers, Neuroimaging and Clinical Progression Data for Alzheimer's, Parkinson's and Multiple Sclerosis

**DOI:** https://doi.org/10.7910/DVN/X2TQQA  
**Author:** de la Serna, Juan Moises  

---

## Processing Pipeline

### 1. Raw Data Ingestion
- Raw data downloaded from official sources
- Files stored in original format
- Source URLs and download dates documented
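The provenance record described above could be kept as a small manifest. The sketch below is a minimal illustration using only the Python standard library. The function names `record_provenance` and `sha256_of`, the manifest layout, and the use of a SHA-256 checksum to show that each raw file is still in its original form are all assumptions, not part of the documented pipeline.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(raw_file: Path, source_url: str, manifest: Path) -> dict:
    """Hypothetical helper: append one row (file, URL, UTC date, checksum) to a TSV manifest."""
    row = {
        "file": raw_file.name,
        "source_url": source_url,
        "download_date": datetime.now(timezone.utc).date().isoformat(),
        "sha256": sha256_of(raw_file),  # lets later steps confirm the raw file is unchanged
    }
    write_header = not manifest.exists()
    with manifest.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row), delimiter="\t")
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Recomputing `sha256_of` on a raw file at any later point and comparing it with the manifest shows whether the stored original has been altered.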

### 2. Data Cleaning
- Removed duplicate records
- Standardized country names to ISO 3166-1 alpha-3 codes
- Coded missing values as NA
- Normalized numeric formats
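A minimal sketch of the cleaning steps, assuming rows are read as dictionaries of strings (for example with `csv.DictReader`). The `ISO3` lookup is a hypothetical excerpt, not a full ISO 3166-1 table. The set of missing-value tokens and the decimal-comma rule in `normalize_number` are likewise assumptions about what normalizing numeric formats involved.

```python
# Hypothetical, deliberately tiny lookup; a real pipeline would use a full ISO 3166-1 table.
ISO3 = {
    "spain": "ESP", "españa": "ESP",
    "united states": "USA", "usa": "USA",
    "united kingdom": "GBR", "uk": "GBR",
    "germany": "DEU", "france": "FRA",
}
# Assumed set of tokens that mean "missing" in the raw files.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-", "."}


def to_iso3(name: str) -> str:
    """Map a free-text country name to ISO 3166-1 alpha-3, or NA if it is not in the lookup."""
    return ISO3.get(name.strip().lower(), "NA")


def normalize_number(value: str) -> str:
    """Assumed rule: drop spaces, treat a single comma as the decimal mark, unparsable -> NA."""
    v = value.strip().replace(" ", "")
    if v.lower() in MISSING_TOKENS:
        return "NA"
    if v.count(",") == 1 and "." not in v:
        v = v.replace(",", ".")  # note: this also turns "1,234" into 1.234
    try:
        return str(float(v))
    except ValueError:
        return "NA"


def clean_rows(rows, numeric_cols, country_col="country"):
    """Standardize every field, then drop exact duplicates, keeping the first occurrence."""
    seen, cleaned = set(), []
    for row in rows:
        out = {}
        for key, value in row.items():
            value = "" if value is None else value
            if key == country_col:
                out[key] = to_iso3(value)
            elif key in numeric_cols:
                out[key] = normalize_number(value)
            else:
                out[key] = "NA" if value.strip().lower() in MISSING_TOKENS else value.strip()
        fingerprint = tuple(sorted(out.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(out)
    return cleaned
```

This sketch removes duplicates after standardization, so records that differ only in spelling (e.g. "Spain" and "españa") collapse into one; deduplicating on the raw values first would keep both.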

### 3. Data Transformation
- Converted files to tab-separated values (TSV) format
- Added derived variables where applicable
- Merged data from multiple sources
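The merge and derived-variable steps might look like the following sketch. The join key `subject_id` and the columns `onset_year`, `visit_year`, and `disease_duration_years` are hypothetical placeholders; the documented pipeline does not specify its keys or derived variables.

```python
import csv
import io


def left_join(left, right, key):
    """Left-join two lists of dict rows on `key`; unmatched right-hand fields become NA."""
    index = {}
    for row in right:
        if row[key] in index:
            # Refuse one-to-many joins so a merge can never silently multiply rows.
            raise ValueError(f"duplicate key in right table: {row[key]!r}")
        index[row[key]] = row
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    merged = []
    for row in left:
        match = index.get(row[key], {})
        out = dict(row)
        for col in right_cols:
            out[col] = match.get(col, "NA")  # a same-named left column is overwritten
        merged.append(out)
    return merged


def add_disease_duration(rows):
    """Hypothetical derived variable: visit_year - onset_year, NA if either is missing."""
    for row in rows:
        try:
            row["disease_duration_years"] = str(int(row["visit_year"]) - int(row["onset_year"]))
        except (KeyError, ValueError):
            row["disease_duration_years"] = "NA"
    return rows


def to_tsv(rows) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Usage: `to_tsv(add_disease_duration(left_join(clinical, biomarkers, "subject_id")))`, where `clinical` and `biomarkers` are lists of row dictionaries.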

### 4. Validation
- Cross-checked values against independent sources
- Statistical range checks performed
- Temporal consistency verified
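Range and temporal checks can be written as functions that report offending records rather than silently dropping them. In this sketch the `bounds` mapping (e.g. age between 0 and 120) and the columns `subject_id` and `visit_date` are illustrative assumptions. Cross-checking against independent sources needs external reference data and is not shown.

```python
from datetime import date


def range_violations(rows, bounds):
    """Return (row_index, column, value) for values outside [low, high]; NA values are skipped."""
    issues = []
    for i, row in enumerate(rows):
        for col, (low, high) in bounds.items():
            value = row.get(col, "NA")
            if value == "NA":
                continue
            if not low <= float(value) <= high:
                issues.append((i, col, value))
    return issues


def temporal_violations(rows, id_col="subject_id", date_col="visit_date"):
    """Return sorted subject IDs whose ISO 8601 visit dates go backwards in record order."""
    last_seen, bad = {}, set()
    for row in rows:
        sid = row[id_col]
        visit = date.fromisoformat(row[date_col])
        if sid in last_seen and visit < last_seen[sid]:
            bad.add(sid)
        last_seen[sid] = visit
    return sorted(bad)
```

Returning the violations, instead of asserting, lets each flagged record be reviewed and documented before deciding whether to correct or exclude it.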

### 5. Output
- Final files exported as .tab (Dataverse) and .csv
- Documentation files generated
- Dataset packaged for Harvard Dataverse submission
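Writing both output formats from the same rows and reading them back is a cheap way to confirm they agree. This sketch assumes `.tab` here simply means tab-delimited text with a header row; `export_both` and `read_back` are hypothetical helper names, not Dataverse tooling.

```python
import csv
from pathlib import Path


def export_both(rows, stem: Path):
    """Write rows to <stem>.tab (tab-delimited) and <stem>.csv; `stem` should have no extension."""
    fieldnames = list(rows[0])
    paths = []
    for suffix, delimiter in ((".tab", "\t"), (".csv", ",")):
        path = stem.with_suffix(suffix)
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter=delimiter)
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


def read_back(path: Path):
    """Read an exported file back into a list of dict rows, choosing the delimiter by suffix."""
    delimiter = "\t" if path.suffix == ".tab" else ","
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))
```

Comparing `read_back(tab_path) == read_back(csv_path) == rows` catches quoting or delimiter problems, such as a comma inside a free-text field, before the files are uploaded.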

## Software Used
- R 4.3+ (data.table, tidyverse)
- Python 3.10+ (pandas, numpy)
